The dataset contains about 27,000 BigBasket products. BigBasket is the largest online grocery store in India. This notebook explores the dataset to draw out insights.

The output above gives a little information about the dataset. We can see columns such as the category, the product, and a sub-category that further helps to classify the product. We can also see the sale price, which is the price the product is sold at on BigBasket, and the market price, which is the price the product is sold at outside BigBasket. The rating tells us how customers feel about each product, and the description gives more detail about it.

The first thing we do is check for missing values in our dataset. We can see below that not much data is actually missing: the brand and product columns are each missing one value, the rating column is missing 8626 values, and the description column is missing 115.
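As a rough sketch of the missing-value check described above, something like the following would work. The tiny `df` here is a made-up stand-in for the real data, and the column names are my assumption of how the dataset labels them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the BigBasket data
# (column names are assumed, values are invented for illustration).
df = pd.DataFrame({
    "product": ["Tea", None, "Soap", "Rice"],
    "brand": ["A", "B", None, "C"],
    "rating": [4.1, np.nan, np.nan, 3.9],
    "description": ["Green tea", "Snack", None, "Basmati"],
})

# Count the missing values in every column.
missing_counts = df.isnull().sum()
print(missing_counts)
```

On the real dataset, the same `isnull().sum()` call is what would report the per-column counts quoted above.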

Now it's time to decide how to treat these missing values. The first step is to find the index of the rows where the product and brand values are missing, then drop those rows by index.

For the brand, the missing value is in the row with index 9766, while the missing value for the product is in the row with index 14364.
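Here is a minimal sketch of how the lookup and the drop-by-index could be done. The toy frame and its index values are invented, and the column names `product` and `brand` are assumed.

```python
import pandas as pd

# Hypothetical toy frame; the index values are arbitrary.
df = pd.DataFrame(
    {
        "product": ["Tea", None, "Soap", "Rice"],
        "brand": ["A", "B", None, "C"],
    },
    index=[10, 11, 12, 13],
)

# Index labels of the rows where each column is missing.
missing_product_idx = df[df["product"].isnull()].index
missing_brand_idx = df[df["brand"].isnull()].index

# Drop both sets of rows by index label.
rows_to_drop = missing_product_idx.union(missing_brand_idx)
cleaned = df.drop(index=rows_to_drop)
print(cleaned)
```

A shorter route would be `df.dropna(subset=["product", "brand"])`, but looking up the index first makes it easy to inspect the affected rows before removing them.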

To resolve the missing ratings, I'm just going to drop the rows where the rating is missing too. Dropping the rating column would mean removing an important feature from my dataset, and I can't think of any sensible values to impute because the ratings come from how customers feel when they use the product, so I think the best option is to just drop those rows.
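Dropping only the rows with a missing rating could look roughly like this. Again, the frame is a made-up example and `rating` is the assumed column name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing rating.
df = pd.DataFrame({
    "product": ["Tea", "Soap", "Rice"],
    "rating": [4.1, np.nan, 3.9],
})

# Keep only the rows that have a rating; other columns may still hold NaNs.
rated = df.dropna(subset=["rating"]).reset_index(drop=True)
print(rated)
```

Using `subset` matters here: a plain `dropna()` would also remove rows that are only missing a description.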

I'm going to leave the description column as it is; it might not be very useful until much later.

EDA

We begin the EDA by checking which categories are the most popular. The histogram shows the number of products in each unique category, of which there are 9 in total.

The cell below shows all the categories.
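A small sketch of how the category counts and the list of categories might be pulled out with pandas. The data is invented and `category` is the assumed column name.

```python
import pandas as pd

# Hypothetical toy frame with a handful of categories.
df = pd.DataFrame({
    "category": [
        "Beverages", "Baby Care", "Beverages",
        "Beauty & Hygiene", "Beverages", "Baby Care",
    ]
})

# Number of products per category, most popular first.
category_counts = df["category"].value_counts()

# All distinct categories, sorted alphabetically.
categories = sorted(df["category"].unique())

print(category_counts)
print(categories)
```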

The next visualization plots the average price for each category.

Here, we can see that the Kitchen, Garden & Pets category has the highest average price and the Snacks & Branded Foods category has the lowest. This is probably true for the ratings too, but we can't know until we check (that will be checked later).

Here, we visualize the average sale price of each category.

Hovering over each bar shows the average sale price that has been plotted. We can see that it's not the Kitchen, Garden & Pets category at the top as it was before; the Baby Care category comes out on top this time.
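The per-category averages behind the two bar charts above could be computed along these lines. I'm assuming the first chart uses `market_price` and the second `sale_price`; the numbers are made up.

```python
import pandas as pd

# Hypothetical toy frame; prices are invented.
df = pd.DataFrame({
    "category": ["Beverages", "Beverages", "Baby Care", "Baby Care"],
    "market_price": [100.0, 200.0, 400.0, 600.0],
    "sale_price": [90.0, 180.0, 380.0, 500.0],
})

# Mean of both price columns per category, highest market price first.
avg_prices = (
    df.groupby("category")[["market_price", "sale_price"]]
    .mean()
    .sort_values("market_price", ascending=False)
)
print(avg_prices)
```

The resulting frame can be handed to whichever plotting library the notebook uses for the bar charts.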

Next, we compare the market and sale prices for each category to see how much they differ from each other.

There isn't much difference between the two prices for most of the categories, except for the Kitchen, Garden & Pets category, where there is an approximately 26% difference between the sale price and the market price, with the market price being the higher of the two. If there was enough data, we'd probably have been able to check whether the store was running at a loss or not.
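One way to put a number on the gap is to express the difference as a percentage of the average market price. That choice of formula is my assumption, since other definitions of "percent difference" are possible, and the data below is invented.

```python
import pandas as pd

# Hypothetical toy frame; prices are invented.
df = pd.DataFrame({
    "category": ["Kitchen, Garden & Pets"] * 2 + ["Beverages"] * 2,
    "market_price": [300.0, 100.0, 100.0, 100.0],
    "sale_price": [200.0, 100.0, 95.0, 95.0],
})

# Average market and sale price per category.
avg = df.groupby("category")[["market_price", "sale_price"]].mean()

# Gap between the averages, as a percentage of the average market price
# (assumed definition of the percent difference).
avg["pct_diff"] = (avg["market_price"] - avg["sale_price"]) / avg["market_price"] * 100
print(avg)
```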

As we can see in the plot, there isn't really much difference: the average rating for each category is approximately 4. The highest is Beverages at 4.08 and the lowest is Kitchen, Garden & Pets at 3.73.

I wonder how Kitchen, Garden & Pets is the lowest though, considering that it has the highest market_price to sale_price ratio.

The next bar plot ranks the top 10 most expensive products at BigBasket.

The Beauty & Hygiene category holds the top 5 most expensive products, with perfumes taking the top 4 places and a protein supplement in 5th place. Next we have 3 products from Kitchen, Garden & Pets, and the last 2 come from Gourmet & World Food.

Now for the least expensive products.

A product from Beauty & Hygiene seems like an outlier here because the rest come from the snacks and world food categories.
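Both rankings above can be sketched with `nlargest` and `nsmallest`. I'm assuming here that products are ranked by `sale_price`; the products and prices are made up, and the real plots use 10 rows rather than the 3 and 2 used here.

```python
import pandas as pd

# Hypothetical toy frame; products and prices are invented.
df = pd.DataFrame({
    "product": ["Perfume", "Protein Powder", "Candy", "Soap", "Cookware"],
    "category": [
        "Beauty & Hygiene", "Beauty & Hygiene", "Snacks & Branded Foods",
        "Beauty & Hygiene", "Kitchen, Garden & Pets",
    ],
    "sale_price": [5000.0, 3000.0, 5.0, 20.0, 2500.0],
})

# Most expensive products, highest price first.
most_expensive = df.nlargest(3, "sale_price")

# Least expensive products, lowest price first.
least_expensive = df.nsmallest(2, "sale_price")

print(most_expensive[["product", "category", "sale_price"]])
print(least_expensive[["product", "category", "sale_price"]])
```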

Now, for the final part, let's create a scatter plot of market_price and sale_price and check what relationship exists between them.

As expected, there is an almost perfect positive correlation between them: an increase in market_price is accompanied by an obvious increase in sale_price.
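To back up what the scatter plot shows with a number, the Pearson correlation between the two columns can be computed directly. This is a sketch on made-up prices, not the value from the real dataset.

```python
import pandas as pd

# Hypothetical toy frame; prices are invented.
df = pd.DataFrame({
    "market_price": [100.0, 200.0, 300.0, 400.0],
    "sale_price": [90.0, 185.0, 270.0, 380.0],
})

# Pearson correlation coefficient (the default method for Series.corr).
r = df["market_price"].corr(df["sale_price"])
print(f"Pearson r = {r:.4f}")
```

A value close to 1 means the two prices move together almost linearly, which matches the pattern seen in the plot.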

With this, we come to the end of the EDA. Suggest visualizations you think should be added, thank you.